Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques

Author

Abhijit Gosavi

Abstract

In control systems theory, the Markov decision process (MDP) is a widely used optimization model involving selection of the optimal action in each state visited by a discrete-event system driven by Markov chains. The classical MDP model is suitable for an agent/decision-maker interested in maximizing expected revenues, but does not account for minimizing variability in the revenues. An MDP model in which the agent can maximize the revenues while simultaneously controlling the variance in the revenues is proposed. This work is rooted in machine learning/neural network concepts, where updating is based on system feedback and step sizes. First, a Bellman equation for the problem is proposed. Thereafter, convergent dynamic programming and reinforcement learning techniques for solving the MDP are provided, along with encouraging numerical results on a small MDP and a preventive maintenance problem.
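As a rough illustration of the trade-off described in the abstract, the sketch below scores every deterministic policy of a tiny, made-up two-state MDP by its long-run mean reward minus theta times its long-run reward variance. This mean-minus-theta-variance score is one common way to write a variance penalty and is an assumption here: the abstract does not state the paper's exact objective or its Bellman equation. Brute-force policy enumeration stands in for the paper's dynamic programming and reinforcement learning algorithms, and all names and numbers (`P`, `R`, `theta`, `evaluate`, `best_policy`) are hypothetical.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP; every number here is illustrative only.
# P[a][s][s2] is the transition probability, R[a][s][s2] the immediate reward.
P = {0: [[0.7, 0.3], [0.4, 0.6]],
     1: [[0.9, 0.1], [0.2, 0.8]]}
R = {0: [[6.0, -5.0], [7.0, 12.0]],
     1: [[10.0, 17.0], [-14.0, 13.0]]}


def stationary(T, iters=1000):
    """Approximate the stationary distribution of chain T by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    return pi


def evaluate(policy, theta):
    """Return (score, mean, variance) for a deterministic policy.

    Assumed score: long-run mean reward minus theta times the long-run
    variance of the immediate reward.
    """
    n = len(policy)
    T = [P[policy[s]][s] for s in range(n)]   # transition rows under the policy
    Rw = [R[policy[s]][s] for s in range(n)]  # reward rows under the policy
    pi = stationary(T)
    # Long-run frequency of each (s, s2) transition, paired with its reward.
    pairs = [(pi[s] * T[s][s2], Rw[s][s2]) for s in range(n) for s2 in range(n)]
    mean = sum(w * r for w, r in pairs)
    var = sum(w * (r - mean) ** 2 for w, r in pairs)
    return mean - theta * var, mean, var


def best_policy(theta):
    """Pick the best deterministic policy by exhaustive enumeration."""
    return max(product((0, 1), repeat=2),
               key=lambda pol: evaluate(pol, theta)[0])


if __name__ == "__main__":
    for theta in (0.0, 0.15):
        pol = best_policy(theta)
        score, mean, var = evaluate(pol, theta)
        print(f"theta={theta}: policy={pol}, mean={mean:.3f}, "
              f"variance={var:.3f}, score={score:.3f}")
```

With `theta = 0` the score reduces to the classical expected-reward criterion. For any positive `theta`, the selected policy's variance cannot exceed that of the mean-maximizing policy, which is the sense in which the penalty controls variability.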


Similar resources

Target-sensitive control of Markov and semi-Markov processes

We develop the theory for Markov and semi-Markov control using dynamic programming and reinforcement learning in which a form of semi-variance which computes the variability of rewards below a pre-specified target is penalized. The objective is to optimize a function of the rewards and risk where risk is penalized. Penalizing variance, which is popular in the literature, has some drawbacks that...


2D1431 Machine Learning Lab 3: Reinforcement Learning

In this lab you will learn about dynamic programming and reinforcement learning. It is assumed that you are familiar with the basic concepts of reinforcement learning and that you have read chapter 13 in the course book Machine Learning (Mitchell, 1997). The first four chapters of the survey on reinforcement learning by Kaelbling et al. (1996) are good supplementary material. For further readin...


2D1431 Machine Learning Lab 4: Reinforcement Learning

In this lab you will learn about dynamic programming and reinforcement learning. It is assumed that you are familiar with the basic concepts of reinforcement learning and that you have read chapter 13 in the course book Machine Learning (Mitchell, 1997). The first four chapters of the survey on reinforcement learning by Kaelbling et al. (1996) are good supplementary material. For further readi...


Solving Hidden-Mode Markov Decision Problems

Hidden-Mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems where environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirect...


A New Learning Algorithm for Optimal Stopping

A linear programming formulation of the optimal stopping problem for Markov decision processes is approximated using linear function approximation. Using this formulation, a reinforcement learning scheme based on a primal-dual method and incorporating a sampling device called ‘split sampling’ is proposed and analyzed. An illustrative example from option pricing is also included.



Journal:

Int. J. General Systems

Volume 43

Publication date: 2014
